Prosodic Alternative Units in a Mandarin Chinese Speech Synthesizer
Abstract
The Mandarin Chinese synthesis component of the Dresden Speech Synthesizer (DreSS) is based on an inventory of syllabic units. The inventory contains all Chinese syllables with their possible tones, each in up to three phonetic variants, so that cross-syllable coarticulation effects are modeled correctly. To improve the naturalness and fluency of the synthesized speech, the inventory was complemented with prosodic alternative units for non-accented syllables, especially for neutral-tone particles. This paper compares two strategies for generating such units: extraction from specially constructed carrier sentences and extraction from a read-speech corpus of newspaper texts. The results of a listening test show the best performance for the units from carrier sentences.
Similar resources
An RNN-based prosodic information synthesizer for Mandarin text-to-speech
A new RNN-based prosodic information synthesizer for Mandarin Chinese text-to-speech (TTS) is proposed in this paper. Its four-layer recurrent neural network (RNN) generates prosodic information such as syllable pitch contours, syllable energy levels, syllable initial and final durations, as well as intersyllable pause durations. The input layer and first hidden layer operate with a word-synchr...
Selection of waveform units for corpus-based Mandarin speech synthesis based on decision trees and prosodic modification costs
A lazy decision tree approach is described in this paper for the selection of concatenative units for Mandarin speech synthesis. The concept is not to induce a concise hypothesis from a given training data; the selection is delayed until a test instance is given. Thus we can construct the “best” decision tree for each selection. The selection of waveform units is guided with other simultaneousl...
A set of corpus-based text-to-speech synthesis technologies for Mandarin Chinese
This paper presents a set of corpus-based text-to-speech synthesis technologies for Mandarin Chinese. A large speech corpus produced by a single speaker is used, and the speech output is synthesized from waveform units of variable lengths, with desired linguistic properties, retrieved from this corpus. Detailed methodologies were developed for designing “phonetically rich” and “prosodically ric...
Recognizing Mandarin Chinese Fluent Speech Using Prosody Information—an Initial Investigation
The aim of the present paper is to demonstrate how prosody information could be used to recognize Mandarin Chinese fluent speech and what the recognized results imply. By applying our hierarchical prosody framework for fluent speech [1, 2] that specifies boundary breaks and boundary information across phrases and group phrases into speech paragraphs, we were able to develop software that automa...
A New Model-Based Mandarin-Speech Coding System
In this paper, a new model-based Mandarin-speech coding system is proposed. It employs a prosody-enriched ASR with a hierarchical prosodic model (HPM) to generate from the input speech enriched transcriptions, including linguistic features, prosodic tags and spectral parameters in the encoder. By sending these features to the decoder, we can first reconstruct the prosodic-acoustic features of s...